Problems in Current Text Simplification Research: New Data Can Help

Authors

  • Wei Xu
  • Chris Callison-Burch
  • Courtney Napoles
Abstract

Simple Wikipedia has dominated simplification research in the past 5 years. In this opinion paper, we argue that focusing on Wikipedia limits simplification research. We back up our arguments with corpus analysis and by highlighting statements that other researchers have made in the simplification literature. We introduce a new simplification dataset that is a significant improvement over Simple Wikipedia, and present a novel quantitative-comparative approach to study the quality of simplification data resources.

Similar resources

A Survey of Automated Text Simplification

Text simplification modifies syntax and lexicon to improve the understandability of language for an end user. This survey identifies and classifies simplification research within the period 1998-2013. Simplification can be used for many applications, including: Second language learners, preprocessing in pipelines and assistive technology. There are many approaches to the simplification task, in...

Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs

Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data. While the recently introduced Newsela corpus has alleviated the first problem, simplifications still need to be...

Applicability improvement and hysteresis current control method simplification in shunt active filters

Hysteresis current control method is vastly used in PWM inverters because of simplicity in performance, fast control response and good ability in limiting peak current. However, switching frequency in hysteresis current control method with fixed bandwidth has large variation during a cycle and therefore causes non-optimal current ripple generation in output current. One of basic problems in imp...

Natural Language Processing for Improving Textual Accessibility (NLP4ITA) Workshop Programme

Analysis of long sentences is a source of problems in advanced applications such as machine translation. With the aim of solving these problems, we have analysed long sentences from two corpora written in Standard Basque in order to perform syntactic simplification. The result of this analysis leads us to design a proposal to produce shorter sentences out of long ones. In ord...

Text Simplification and Pupillometry: An Exploratory Study

Cognitive load is a major factor affecting user performance. Hence, a better understanding of cognitive load can help design better information systems. To achieve this goal, in this study we looked at the relationship between cognitive load and pupillary responses for a task that required people to either read a text passage from an actual website or read the simplified version of the same tex...

Journal:
  • TACL

Volume 3

Publication year: 2015